Method and device for generating and detecting fingerprints for synchronizing audio and video



The present invention relates to synchronisation between at least two signals.
More specifically, the invention relates to a method, and a corresponding device, of
synchronising a first signal, e.g. an audio signal, and a second signal, e.g. a video signal. The
invention also relates to a method, and a corresponding device, of enabling synchronisation
of an audio signal and a video signal. Further, the invention relates to a computer readable
medium having stored thereon instructions for causing one or more processing units to
execute the method according to the invention.

Synchronisation of a video stream with a corresponding audio stream is a
difficult problem which has received a lot of attention. Many solutions to this problem have
been proposed and implemented. Most of these solutions require manual synchronisation by
a skilled operator. Typically, the operator looks for visual clues within the picture to
determine whether the sound heard corresponds to the picture and whether the two are indeed synchronous.

The problem becomes much harder when the synchronisation needs to be done automatically.
This problem is becoming more and more relevant, as the processing and distribution of
audio and video signals become ever more complicated, both inside and outside a
studio environment. An example of the latter is the following: a consumer records a movie
with his video recorder and would like to view it with the original soundtrack. He therefore
buys the original soundtrack, which, for example, is streamed to him over the Internet. The
audio and the video now need to be synchronised automatically, e.g. in/by his video recorder or
another synchronisation device.

One previous system that allows the automatic synchronisation of an
audio and a video stream is marketed by Tektronix. In this system, the envelope of the audio
signal is embedded by means of a watermark into the video signal. At any point in the
distribution or processing chain, the actual audio envelope can be compared to the embedded
one, from which the delay between the two streams can be derived. Subsequently, the delay
of the audio is corrected to achieve synchronisation. However, this system requires the
co-operation of the broadcaster or another distributor, because the watermark needs to be
embedded in the video before transmission. Further, this system can only associate one
particular audio stream with the video. Once the envelope of an audio stream has been
embedded, the system can only synchronise the video with that particular audio stream;
synchronising other audio streams would require another watermark to have been embedded.
Finally, the system is restricted to synchronisation between an audio stream and a video stream.



It is an object of the invention to provide a method and corresponding device
for generating a first and a second fingerprint usable for synchronisation of at least two
signals, and a corresponding method and device for synchronising two or more signals, that
solve the above-mentioned problems. A further object is to provide this in a simple and
efficient way. Another object is to enable simple, reliable and accurate localisation of a given
part of a multimedia signal. A further object is to enable automatic synchronisation between a
first signal and at least a second signal without modifying any of the signals.

This is achieved by a method (and corresponding device) of enabling
synchronisation of a first and a second signal, the method comprising the steps of:

- deriving a first fingerprint on the basis of a segment of the first signal, where the
  segment of the first signal is unambiguously related to a first synchronisation time
  point,

- deriving a second fingerprint on the basis of a segment of the second signal, where the
  segment of the second signal is unambiguously related to a second synchronisation
  time point, and

- supplying the first and second fingerprints to a synchronisation device;

and by a method (and corresponding device) of synchronising two or more signals, the
method comprising the steps of:

- generating a first fingerprint stream on the basis of a first signal,

- generating a second fingerprint stream on the basis of a second signal,

- comparing a segment of the first fingerprint stream with one or more first fingerprints
  stored in at least one database in order to determine whether a match exists,

- comparing a segment of the second fingerprint stream with one or more second
  fingerprints stored in the at least one database in order to determine whether a match
  exists, and

- if a match exists for both a first and a second fingerprint, determining a location of a
  first synchronisation time point for the first signal and a location of a second
  synchronisation time point for the second signal, and synchronising the first and the
  second signal using the determined locations.

In this way, a simple, reliable and efficient way of synchronising at least two
signals is obtained. Further, this is achieved without modifying either the first or the second
signal (or any subsequent signals). Owing to the use of fingerprints, the signals may even be
distorted or changed to some extent while still enabling accurate synchronisation.

A fingerprint of a multimedia object/content/signal is a representation of
perceptual features of the object/content/signal part in question. Such fingerprints are
sometimes also known as "(robust) hashes". More specifically, a fingerprint of a piece of
audio or video is an identifier which is computed over that piece of audio or video and which
does not substantially change even if the content involved is subsequently transcoded, filtered
or otherwise modified.

Advantageous embodiments of the methods and devices according to the
present invention are defined in the sub-claims.

Further, the invention also relates to a computer readable medium having
stored thereon instructions for causing one or more processing units to execute the method
according to the present invention.

Figure 1a schematically illustrates generation of fingerprint pair(s) to be used
for synchronisation between an audio and a video signal;

Figure 1b schematically illustrates detection of such generated fingerprint
pair(s) used for synchronisation according to the present invention;

Figure 2 illustrates a schematic block diagram of a fingerprint generation
device according to the present invention;

Figure 3 illustrates a schematic block diagram of a synchronisation device
detecting and using fingerprints according to the present invention;

Figure 4 illustrates one example of tables/records according to the present
invention;

Figure 5 illustrates an alternative embodiment of a relationship between time
points in a first and in a second signal;

Figure 6 illustrates an embodiment where first and second representations are
stored at a remote location;

Figure 7 schematically illustrates in more detail how the synchronisation may, in
one embodiment, be done in a synchronisation device using buffers.

Figure 1a schematically illustrates generation of fingerprint pair(s) to be used
for synchronisation between an audio and a video signal.

Shown are a digital or analog first signal 101 and a digital or analog second 
signal 103. In the following the first signal 101 is an audio signal and the second signal 103 is 
a video signal. 

At one or more synchronisation time points T_n, T_{n+1}, a fingerprint pair has to be
derived. These time points are selected according to at least one predetermined criterion, e.g.
criteria specifying a time point at the beginning of the audio and/or video signal, a time point
at the end and a time point in between. Alternatively, the time points may be selected as one
time point at the beginning followed by one time point each time a given period has elapsed,
e.g. one time point every 2 minutes or every 2 seconds, etc.
Alternatively, the time points may be derived from analysis of the underlying signal itself,
e.g. at each scene change in a video signal. Only a single synchronisation time point (e.g. T_n) is
needed in order to enable synchronisation between the two signals 101, 103 according to
the present invention. However, the use of more time points T_n, T_{n+1} enables better
synchronisation, e.g. in a situation where one (or both) of the signals have been truncated,
modified, etc. One example taking advantage of several time points is a user who
has recorded a movie and has bought the original soundtrack, as described earlier, but where
the movie has been recorded with commercial breaks. Adding more synchronisation time
points enables better synchronisation, especially if the synchronisation time points are at
or near the end points of the commercial breaks.
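The periodic selection criterion, optionally combined with analysis-derived points such as scene changes, can be sketched as follows. `select_sync_points` is a hypothetical helper; time is assumed to be in seconds, and the scene-change times are assumed to come from a separate analysis step not shown here.

```python
def select_sync_points(duration_s, period_s, scene_changes=()):
    """Return sorted synchronisation time points (seconds) for one signal.

    One point at the beginning, one each time `period_s` has elapsed,
    one at the end, plus any scene-change times inside the signal.
    """
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    points = set()
    k = 0
    while k * period_s < duration_s:
        points.add(float(k * period_s))  # multiply, not accumulate, to avoid drift
        k += 1
    points.add(float(duration_s))        # a time point at the end
    points.update(float(s) for s in scene_changes if 0 <= s <= duration_s)
    return sorted(points)


print(select_sync_points(300, 120))                    # [0.0, 120.0, 240.0, 300.0]
print(select_sync_points(300, 120, [45.5, 120, 999]))  # [0.0, 45.5, 120.0, 240.0, 300.0]
```

Using a set means a scene change that coincides with a periodic point (120 s above) does not produce a duplicate time point.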

PHNL03 088 8EPP 25.07.2003

One audio fingerprint 102 is derived for each synchronisation time point T_n,
T_{n+1} for the audio signal 101, and a video fingerprint 104 is derived for the video signal 103 at
the same synchronisation time point(s) T_n, T_{n+1}, resulting in a fingerprint pair 102, 104 for
each synchronisation time point T_n, T_{n+1}. A fingerprint (for audio and/or video) for a
given time point T_n, T_{n+1} is preferably derived from a segment of the signal that
(substantially) starts at the given time point. Alternatively, the segment may (substantially) end
at the given time point T_n; T_{n+1}, the segment may start or end at a predetermined distance
(substantially) before or after the given time point T_n; T_{n+1}, the given time point T_n; T_{n+1}
may be at a predetermined time point between the start and the end of the segment, or any
other scheme may be used, as long as the same scheme is applied during synchronisation to
determine the given time point T_n; T_{n+1} on the basis of a fingerprint, as will be explained
in more detail in connection with Figure 1b.
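Because the only requirement is that generation and synchronisation apply the same scheme, each scheme above can be captured as one constant offset between segment start and time point. The `AnchorScheme` class and its fields are hypothetical, and times are assumed to be in seconds; the same object is used in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorScheme:
    """How a fingerprinted segment relates to its synchronisation time point.

    offset_s = segment start - time point, for example:
      0.0            segment starts at the time point (preferred scheme)
      -length_s      segment ends at the time point
      -length_s / 2  time point lies midway through the segment
      other constant segment starts a predetermined distance before/after
    """
    length_s: float
    offset_s: float = 0.0

    def segment_for(self, t_sync):
        """Generation side: (start, end) of the segment to fingerprint."""
        start = t_sync + self.offset_s
        return start, start + self.length_s

    def time_point_for(self, matched_start):
        """Synchronisation side: time point implied by a matched segment start."""
        return matched_start - self.offset_s


centred = AnchorScheme(length_s=3.0, offset_s=-1.5)
print(centred.segment_for(334.0))     # (332.5, 335.5)
print(centred.time_point_for(332.5))  # 334.0
```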

The fingerprints may be of either a predetermined fixed size or a variable size.

One method for computing a robust fingerprint is described in international 
patent application WO 02/065782 (attorney docket PHNL010110), although of course any 
method for computing a robust fingerprint can be used. 

European patent application 01200505.4 describes a method that generates
robust fingerprints for multimedia content such as, for example, audio clips, where the audio
clip is divided into successive (preferably overlapping) time intervals. For each time interval,
the frequency spectrum is divided into bands. A robust property of each band (e.g. energy) is
computed and represented by a respective fingerprint bit.
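A minimal sketch of this kind of band-energy fingerprinting, in pure Python with a naive DFT so that it is self-contained. The frame length, hop, number of bands and equal-width bands are illustrative assumptions. The bit rule is also an assumption: rather than thresholding each band's energy on its own, it takes the sign of the energy difference between adjacent bands and consecutive intervals, as in the published Haitsma/Kalker audio fingerprint, which yields one bit per pair of adjacent bands.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy in each of n_bands equal-width bands of one frame (naive DFT)."""
    n = len(frame)
    powers = []
    for k in range(n // 2):  # positive-frequency bins only
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        powers.append(c.real * c.real + c.imag * c.imag)
    width = len(powers) // n_bands
    return [sum(powers[b * width:(b + 1) * width]) for b in range(n_bands)]


def sub_fingerprints(samples, frame_len=64, hop=16, n_bands=5):
    """One (n_bands - 1)-bit word per overlapping time interval (after the first)."""
    energies = [band_energies(samples[s:s + frame_len], n_bands)
                for s in range(0, len(samples) - frame_len + 1, hop)]
    words = []
    for prev, cur in zip(energies, energies[1:]):
        word = 0
        for m in range(n_bands - 1):
            # Sign of the change in adjacent-band energy difference between intervals.
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            word = (word << 1) | (1 if diff > 0 else 0)
        words.append(word)
    return words


signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(256)]
fp = sub_fingerprints(signal)
print(len(fp))                                            # 12 words of 4 bits each
print(fp == sub_fingerprints([0.5 * x for x in signal]))  # True: a gain change leaves the bits unchanged
```

Because every bit depends only on the sign of an energy difference, a uniform change in level leaves the fingerprint unchanged, which illustrates the robustness the text relies on.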

Multimedia content is thus represented by a fingerprint comprising a
concatenation of binary values, one for each time interval. The fingerprint does not need to
be computed over the whole multimedia content, but can be computed once a portion of a
certain length has been received. There can thus be plural fingerprints for one multimedia
content, depending on which portion the fingerprint is computed over.
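Computing a fingerprint as soon as a portion of a certain length has been received can be sketched as a sliding window over the per-interval values. `fingerprint_blocks` is a hypothetical helper. Each yielded block is one of the plural fingerprints of the same content, tagged with the index of its first interval, and the input may be a live generator.

```python
from collections import deque


def fingerprint_blocks(sub_fps, block_len):
    """Yield (first_interval_index, block) each time a full block is available."""
    window = deque(maxlen=block_len)  # oldest interval drops out automatically
    for i, word in enumerate(sub_fps):
        window.append(word)
        if len(window) == block_len:
            yield i - block_len + 1, tuple(window)


print(list(fingerprint_blocks([3, 9, 4, 7], block_len=3)))
# [(0, (3, 9, 4)), (1, (9, 4, 7))]
```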

Further, video fingerprinting algorithms are known, e.g. from the following
disclosure: Job Oostveen, Ton Kalker, Jaap Haitsma: "Feature Extraction and a Database
Strategy for Video Fingerprinting", pp. 117-128. In: Shi-Kuo Chang, Zhe Chen, Suh-Yin Lee
(Eds.): Recent Advances in Visual Information Systems, 5th International Conference,
VISUAL 2002, Hsin Chu, Taiwan, March 11-13, 2002, Proceedings. Lecture Notes in
Computer Science 2314, Springer 2002.
According to the present invention, an audio fingerprint 102 and a video
fingerprint 104 are generated for each time point T_n, T_{n+1} on the basis of a given segment of
the audio signal 101 and a segment of the video signal 103 at or near the specific time point.

In this way, a given fingerprint pair 102, 104 is a synchronisation marker
enabling very accurate and precise location of a given time point of the signals 101
and 103, not by using the specific time point itself but by using (a segment of) the signal.
Further, this is achieved without changing the signals. Even for video fingerprinting, the
localisation is typically frame accurate, at least as long as any distortion of the video signal is
not too severe.
After a fingerprint pair 102, 104 has been generated, it is preferably stored for
later use in a database, memory, storage and/or the like.

There are several advantages in storing fingerprint pairs 102, 104 for
multimedia signals 101, 103 in a database instead of the multimedia signals themselves. To name a
few:

- The memory/storage requirements for the database are reduced.

- The comparison of fingerprints is more efficient than the comparison of the
  multimedia signals themselves, as fingerprints are substantially shorter than the
  signals.

- Searching in a database for a matching fingerprint is more efficient than searching for
  complete multimedia signals, since it involves matching shorter items.

- Searching for a matching fingerprint is more likely to be successful, as small changes
  to a multimedia signal (such as encoding in a different format or changing the bit rate)
  do not affect the fingerprint.
The generated fingerprint pairs 102, 104 stored in the database may then be
distributed to one or more synchronisation devices (via the Internet or via other means) for
synchronisation of the signals according to the present invention, e.g. before playback,
storage or further transmission of both (synchronised) signals, etc.

Note that the invention is also applicable to synchronisation of more than two
signals, and to signals of types other than audio and video, as long as a
robust fingerprint can be obtained. In principle, any number of signals may be synchronised
according to the present invention. This simply requires an additional fingerprint at
each time point T_n, T_{n+1} for each additional signal.
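With any number of signals, once each signal's fingerprint for the same time point has been matched, the match locations all mark the same instant. A minimal sketch, assuming match locations in seconds and a policy of only delaying signals (never skipping ahead), computes how long to hold back each signal relative to the one whose match occurs latest. `delays_for_alignment` and the signal names are hypothetical.

```python
def delays_for_alignment(match_locations):
    """Map signal name -> delay (s) so all matched locations coincide."""
    latest = max(match_locations.values())
    return {name: latest - loc for name, loc in match_locations.items()}


print(delays_for_alignment({"audio": 334, "video": 363, "subtitles": 340}))
# {'audio': 29, 'video': 0, 'subtitles': 23}
```

Delaying rather than advancing suits live playback, where future content of a signal may not yet be available.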

Alternatively, the fingerprints of a fingerprint pair may be generated at different time
points for the respective signals, i.e. one fingerprint of the pair may be generated
e.g. at 25 seconds into the first signal, while the other fingerprint may be generated e.g. at 30
seconds into the second signal. However, this requires a well-defined relationship between
each respective time point (e.g. 25 seconds and 30 seconds in the above example) and a
common time line/frame. This alternative embodiment will be described in greater detail in
connection with Figure 5.

Figure 1b schematically illustrates detection of such generated fingerprint
pair(s) used for synchronisation according to the present invention. Shown are a digital or
analog first (to-be-synchronised) signal 101 and a digital or analog second (to-be-synchronised)
signal 103. In the following, the first signal 101 is an audio signal and the
second signal 103 is a video signal. Further shown are a first fingerprint stream 105 and a
second fingerprint stream 106 that are generated continuously or substantially continuously
on the basis of the audio signal 101 and the video signal 103, respectively. Alternatively, the
fingerprint streams 105, 106 are generated in segments. Each fingerprint stream 105, 106 (or
segment) is compared with fingerprints 102, 104, e.g. stored in a database, in order to
determine whether there is a match. More specifically, the audio fingerprint stream 105 is
compared with stored audio fingerprints 102 and the video fingerprint stream 106 is
compared with stored video fingerprints 104. The stored fingerprints 102, 104 are generated
as explained in connection with Figure 1a, e.g. at a central location, and are received
e.g. via the Internet or via some other means, e.g. from the central location.
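The comparison of a stored fingerprint against a fingerprint stream can be sketched as a sliding search. At each position the search counts differing bits (Hamming distance) and accepts the best position whose bit error rate stays below a threshold. The bit-error-rate criterion and the 0.35 default threshold are assumptions borrowed from the published Haitsma/Kalker audio fingerprinting work; `find_match` and its parameters are hypothetical.

```python
def bit_errors(a, b):
    """Number of differing bits between two fingerprint words."""
    return bin(a ^ b).count("1")


def find_match(stream, stored, bits_per_word, max_ber=0.35):
    """Index in `stream` where `stored` matches best, or None if no position qualifies."""
    n = len(stored)
    best = None
    for start in range(len(stream) - n + 1):
        errors = sum(bit_errors(s, f) for s, f in zip(stream[start:start + n], stored))
        ber = errors / (n * bits_per_word)
        if ber <= max_ber and (best is None or ber < best[1]):
            best = (start, ber)
    return None if best is None else best[0]


audio_stream = [5, 9, 3, 7, 1, 12, 6]              # fingerprint stream 105 (4-bit words)
print(find_match(audio_stream, [7, 1, 13], 4))     # 3: match despite one flipped bit
print(find_match([15, 15, 15, 15], [0, 0, 0], 4))  # None
```

The returned index is the match location in the stream. Combined with the generation scheme, it yields the synchronisation time point.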

When a match is found between a segment of the audio fingerprint stream 105 and a
given audio fingerprint 102 in the database, and a match is found between a segment of the
video fingerprint stream 106 and a given video fingerprint 104 in the database, i.e.
when a matching fingerprint pair has been found, the appropriate synchronisation time point
T_n; T_{n+1} is also given, provided that the fingerprints 102, 104 have been generated according to the
present invention as explained in connection with Figure 1a.

The specific synchronisation time point T_n; T_{n+1} is determined dependent on
the scheme that was used during generation of the audio fingerprint 102 and the video
fingerprint 104 at that particular time point T_n; T_{n+1}.

Preferably, the specific synchronisation time point T_n; T_{n+1} is given as the point at which
the segment of the audio signal 101 and the segment of the video signal 103, on which the
matching fingerprint pair 102, 104 was originally based during generation (according
to Figure 1a), (substantially) start. In alternative embodiments, the segments of the audio
signal 101 and the video signal 103 (substantially) end at the given time point T_n; T_{n+1}, the
segments of the audio and video signals 101, 103 start or end at a predetermined distance
before or after the given synchronisation time point T_n; T_{n+1}, or the given synchronisation
time point T_n; T_{n+1} is at a predetermined time point between the start and the end of the
segments of the audio signal 101 and the video signal 103.

The synchronisation device simply needs to be aware of the relationship 
between a given fingerprint and the given time point used during generation, which may be 
determined and implemented during manufacture of the synchronisation device or 
alternatively be updatable. 
As explained, after a matching fingerprint pair 102, 104 has been determined, the
time point T_n; T_{n+1} of this pair is also known and serves as a synchronisation time point, as this
time point directly gives a reference point between the two signals 101 and 103. The
synchronisation device then compensates for the delay (if any) between the two signals, e.g.
by shifting one of them so that they are aligned with respect to the time point.

The above-mentioned international patent application WO 02/065782
(attorney docket PHNL010110) describes various strategies for matching
fingerprints computed for an audio clip with fingerprints stored in a database. One such
method, for matching a fingerprint representing an unknown information signal with a
plurality of fingerprints of identified information signals stored in a database in order to
identify the unknown signal, uses reliability information of the extracted fingerprint bits. The
fingerprint bits are determined by computing features of an information signal and
thresholding said features to obtain the fingerprint bits. If a feature has a value very close to
the threshold, a small change in the signal may lead to a fingerprint bit with the opposite value.
The absolute value of the difference between feature value and threshold is used to mark each
fingerprint bit as reliable or unreliable. The reliabilities are subsequently used to improve the
actual matching procedure.
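The sketch below shows one simple way of using such reliabilities, under the assumption that unreliable bits are simply left out of the error count. WO 02/065782 describes its own strategies; this masking rule, the `margin` value and the helper names are illustrative assumptions and do not reproduce that application's method.

```python
def bits_with_reliability(features, threshold=0.0, margin=0.1):
    """Threshold features into bits; flag bits whose feature is near the threshold."""
    bits = [1 if f > threshold else 0 for f in features]
    reliable = [abs(f - threshold) >= margin for f in features]
    return bits, reliable


def masked_bit_error_rate(query_bits, reliable, stored_bits):
    """Bit error rate computed over the reliable query bits only."""
    used = [(q, s) for q, s, r in zip(query_bits, stored_bits, reliable) if r]
    if not used:
        return 1.0  # nothing trustworthy to compare: treat as no match
    return sum(q != s for q, s in used) / len(used)


bits, reliable = bits_with_reliability([0.9, 0.02, -0.7, -0.05, 0.4])
stored = [1, 0, 0, 1, 1]
print(bits, reliable)                                 # [1, 1, 0, 0, 1] [True, False, True, False, True]
print(masked_bit_error_rate(bits, reliable, stored))  # 0.0 (both mismatches are unreliable bits)
```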

In this way, synchronisation may be obtained even though one of the signals,
e.g. the video signal, has been obtained in a lesser quality, has been modified (e.g.
compressed), etc.

Please note that the audio signal 101 and/or the video signal 103 may be a
distorted version of the signal used during generation of the fingerprints, i.e. the signals of
Figure 1a.

As mentioned in connection with Figure 1a, this embodiment may easily be
modified to accommodate synchronisation of more than two signals and/or signals of types
other than audio and/or video.

Figure 2 illustrates a schematic block diagram of a fingerprint generation 

device according to the present invention. 

Shown is a fingerprint generation device 200 comprising a signal input module
201, a fingerprinting module 202 and a database, memory, storage and/or the like 203,
communicating via a bus 205 or the like under the control of one or more microprocessors
(not shown). In one embodiment, the fingerprint generation device 200 may optionally also
comprise a transmitter and receiver 204 for communicating with other systems, devices, etc.
via a wired and/or wireless network, e.g. the Internet.
The signal input module 201 receives a first signal 101 and at least a second signal 103.
In the following, two signals comprising multimedia content are received, in the form
of an analog or digital audio signal and a video signal. The input module 201 feeds the two
signals to the fingerprinting module 202. The fingerprinting module 202 also receives a
representation of the time points (..., T_n, T_{n+1}, ...) that are to be used as synchronisation time
points. Alternatively, the time points are derived by the fingerprint generation device 200 itself. If
the time points are supplied to, rather than generated by, the fingerprint generation device 200, it
is not necessary to supply the device with the complete audio
signal 101 and the complete video signal 103. It is then sufficient to provide only the respective
segments of the audio signal 101 and the video signal 103 that are used for the fingerprint
generation, i.e. one segment of each signal for each time point.

The transmitter and receiver 204 may also be responsible for receiving one or
more of the signals 101 and 103 and supplying it/them to the signal input module 201 or directly to
the fingerprinting module 202.

The fingerprinting module 202 computes fingerprints on the basis of the
received audio signal 101 and video signal 103. A fingerprint may be derived for the entire content
or for a part of the content. Alternatively, several fingerprints may be derived, each from a
different part. According to the present invention, a fingerprint is derived for each time point
T_n, T_{n+1}, as explained in connection with Figure 1a. Alternatively, the fingerprinting module
202 may be divided into, or comprise, two (e.g. distinct) fingerprinting modules: one for
deriving audio fingerprints and one for deriving video fingerprints.

The fingerprinting module 202 then supplies the computed fingerprint pair(s)
to the database 203. As shown in Figure 4, the database 203 is organised with one column
comprising video fingerprints 104 'V_FP1', 'V_FP2', 'V_FP3', 'V_FP4', 'V_FP5', etc. and one
column comprising the corresponding audio fingerprints 102 'A_FP1', 'A_FP2', 'A_FP3',
'A_FP4', 'A_FP5', etc.

The database 203 can be organised in various ways to optimise query time 
and/or data organisation. The output of the fingerprinting module 202 should be taken into 
account when designing the tables in the database 203. In the embodiment shown in Figure 4, 
the database 203 comprises a single table with entries (records) comprising respective 
fingerprint pairs. 
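A single-table layout along the lines of Figure 4 can be sketched with Python's built-in `sqlite3` module. The table name, the column names and the use of the time-point index n as the key are assumptions for illustration. Fingerprints are stored as opaque byte strings.

```python
import sqlite3


def create_pair_table(conn):
    # One record per synchronisation time point, holding its fingerprint pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprint_pairs ("
        " sync_point INTEGER PRIMARY KEY,"  # index n of time point T_n
        " video_fp BLOB NOT NULL,"          # e.g. 'V_FP1'
        " audio_fp BLOB NOT NULL)"          # e.g. 'A_FP1'
    )


def store_pair(conn, n, video_fp, audio_fp):
    conn.execute("INSERT INTO fingerprint_pairs VALUES (?, ?, ?)", (n, video_fp, audio_fp))


def lookup_pair(conn, n):
    """Return (video_fp, audio_fp) for time point n, or None."""
    return conn.execute(
        "SELECT video_fp, audio_fp FROM fingerprint_pairs WHERE sync_point = ?", (n,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
create_pair_table(conn)
store_pair(conn, 1, b"V_FP1", b"A_FP1")
store_pair(conn, 2, b"V_FP2", b"A_FP2")
print(lookup_pair(conn, 2))  # (b'V_FP2', b'A_FP2')
```

For matching against a fingerprint stream, the device would typically load all pairs, or index them by fingerprint content, rather than look them up by time point. The keyed lookup here only illustrates the record structure.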

As mentioned, this exemplary embodiment may easily be modified to 
accommodate synchronisation of more than two signals and/or signals of another type than 
audio and/or video. 
Figure 3 illustrates a schematic block diagram of a synchronisation device
detecting and using fingerprints according to the present invention.

Shown is a synchronisation device 300 comprising a signal receiver 301, a
fingerprint detector 302, a synchronisation circuit 303 and a database, memory, storage and/or the
like 203, communicating via a bus 205 or the like under the control of one or more
microprocessors (not shown). In one embodiment, the synchronisation device 300 may
optionally also comprise a transmitter and receiver 204 for communicating with other
systems, devices, etc. via a wired and/or wireless network, e.g. the Internet.

The signal receiver 301 receives a first signal 101 and at least a second signal 103. In
the following, two signals comprising multimedia content are received, in the form of an
analog or digital audio signal and an analog or digital video signal to be synchronised. The
transmitter and receiver 204 may also be responsible for receiving one or more of the signals
101 and 103 and supplying it/them to the signal receiver 301 or directly to the fingerprint
detector 302.

The received signals are fed to the fingerprint detector 302, which derives a
fingerprint stream, or segments thereof, for each signal and determines whether there are any
matches with fingerprint pairs stored in the database 203, as explained in connection with
Figure 1b. If a match is found, the specific synchronisation time point T_n; T_{n+1} for each
signal is also determined. How the synchronisation time point T_n; T_{n+1} is determined for
each signal depends on the scheme that was used during generation of the
audio fingerprint 102 and the video fingerprint 104 at that particular time point T_n; T_{n+1}.

Preferably, the specific synchronisation time point T_n; T_{n+1} is given as the point at which
the segment of the audio signal 101 and the segment of the video signal 103, on which the
matching fingerprint pair 102, 104 was originally based during generation (according
to Figure 1a), (substantially) start. In alternative embodiments, the segments of the audio
signal 101 and the video signal 103 (substantially) end at the given time point T_n; T_{n+1}, the
segments of the audio and video signals 101, 103 start or end at a predetermined distance
before or after the given synchronisation time point T_n; T_{n+1}, or the given synchronisation
time point T_n; T_{n+1} is at a predetermined time point between the start and the end of the
segments of the audio signal 101 and the video signal 103.

The synchronisation device simply needs to be aware of the relationship
between a given fingerprint and the given time point used during generation, which may be
determined and implemented during manufacture of the synchronisation device or,
alternatively, be updatable.

As explained, after a matching fingerprint pair 102, 104 has been determined, the
time point T_n; T_{n+1} for each fingerprint of this pair is also known (although not necessarily its
value, but only its location in both the audio and the video signal) and serves as a
synchronisation time point, as these time points directly give a reference point between the
two signals 101 and 103. The synchronisation circuit 303 then compensates for the delay or
offset (if any) between the two signals, e.g. by shifting one or both of them so that they are
aligned with respect to the synchronisation time point.

As a simple example, suppose that a synchronisation time point is at 5 minutes and
34 seconds of the signals 101 and 103 during generation according to Figure 1a. During the
generation of fingerprints (according to Figure 1a), one audio fingerprint would be derived at
or near (depending on the scheme used) 5 minutes and 34 seconds in the audio signal 101, and
one video fingerprint would also be derived at or near (depending on the scheme used) 5
minutes and 34 seconds in the video signal 103. These two fingerprints would then be stored
and transmitted to a synchronisation device carrying out the synchronisation between the two
signals. At the synchronisation device, a fingerprint stream 105 of the audio signal and a
fingerprint stream 106 of the video signal would be compared against the two stored
fingerprints. When a match between the stored audio fingerprint and the audio fingerprint
stream is found, the location of the match in the fingerprint stream (e.g. T_n in 105 in Figure 1b)
gives the synchronisation time point used, i.e. what should correspond to 5 minutes and
34 seconds. Likewise, when a match between the stored video fingerprint and the video
fingerprint stream is found, the location of the match in the fingerprint stream (e.g. T_n in 106 in
Figure 1b) gives the synchronisation time point used, i.e. 5 minutes and 34 seconds.
The two signals may be shifted relative to each other, but the exact location in each signal (as
given by the segment of the fingerprint stream that matches a stored fingerprint) of what should
be 5 minutes and 34 seconds may then be used to align the two signals. The specific value of the
time point (5 minutes and 34 seconds) does not even need to be known or derived. The only
knowledge needed is that the fingerprint matching locations of the two signals 101, 103
should be aligned/synchronised. The synchronisation time point of 5 minutes and 34 seconds
may, for example, correspond to 5 minutes and 34 seconds in the audio signal (e.g. because
this is the original soundtrack used during generation of the audio fingerprint) and to 6
minutes and 3 seconds in the video signal (e.g. if the video signal additionally comprises
commercial breaks compared to the original video signal used during generation of the
video fingerprint). The difference/offset between the two time values (6 min 3 s - 5 min
34 s = 29 s) may then be used to compensate for the delay, e.g. by shifting the playback
so that the audio signal and the video signal are played at the same time from the
synchronisation time point onwards (provided that no further modifications of either signal
are present, e.g. an additional commercial break).
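The arithmetic of this example can be checked in a few lines. The helper names are hypothetical. The match locations are taken directly from the example (5 min 34 s in the audio, 6 min 3 s in the video), and compensation is assumed to delay the audio by the offset.

```python
def to_seconds(minutes, seconds):
    return 60 * minutes + seconds


def alignment_offset(audio_match_s, video_match_s):
    """Offset by which the audio must be delayed so both matched locations coincide."""
    return video_match_s - audio_match_s


audio_match = to_seconds(5, 34)  # match location in audio fingerprint stream 105
video_match = to_seconds(6, 3)   # match location in video fingerprint stream 106
offset = alignment_offset(audio_match, video_match)
print(audio_match, video_match, offset)  # 334 363 29

# From the synchronisation time point onwards, video time v pairs with audio time v - offset.
print(to_seconds(7, 0) - offset)  # 391: audio second played at video time 7 min 0 s
```

As the text notes, this single offset holds only until a further modification, such as another commercial break, appears in either signal. A later synchronisation time point would then supply a new offset.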

Preferably, the data layout of the database 203 corresponds to the one shown
in Figure 4.

As mentioned, this exemplary embodiment may easily be modified to
accommodate synchronisation of more than two signals and/or signals of types other than
audio and/or video.

Figure 4 illustrates one example of tables/records according to the present
invention. Shown is a table comprising fingerprint pairs 102, 104. In this example, the table is
organised with one column comprising video fingerprints 104 'V_FP1', 'V_FP2', 'V_FP3',
'V_FP4', 'V_FP5', etc. and one column comprising the respective corresponding
audio fingerprints 102 'A_FP1', 'A_FP2', 'A_FP3', 'A_FP4', 'A_FP5', etc.

Figure 5 illustrates an alternative embodiment of a relationship between time
points in a first and in a second signal. Shown are a first signal 101 and a second signal 103.
In this embodiment, a third, reference, common or internal time clock/line 107 (henceforth
denoted the reference time line) is also shown to better explain the principle of this
embodiment.

In this particular example of the alternative embodiment, a fingerprint (not
shown) has been generated for the first signal 101 at a first synchronisation time point T_n
having the value 560. This particular time point T_n of the first signal 101 is related, as
indicated by an arrow, to a reference time frame represented by the reference time line 107,
namely to a time point having the value 8:45:17.23 on the reference time line 107 (indicating
that the first signal at T_n = 560 should be presented at 8:45:17.23). A representation of this
relationship between the particular time point T_n of the first signal 101 and the reference time
line (i.e. a first representation) may be associated with the generated first fingerprint and stored
in a database (e.g. the same database as the one containing the generated fingerprint, or a
different one), as will be explained later.

Further, a fingerprint (not shown) has been generated for the second signal 103 
at a second synchronisation time point T_m having the value 1800. As indicated by an arrow, 
this time point T_m of the second signal 103 is related to the same reference time frame, 
namely to the time point 8:45:17.18 on the reference time line 107 (indicating that the second 
signal at T_m = 1800 should be presented at 8:45:17.18). A representation of this 
relationship for the time point T_m of the second signal 103 (i.e. a second representation) 
may be associated with the generated second fingerprint and stored in a database (e.g. the 
database containing the generated fingerprint or a different one), as will be explained later. 

The first and second representations may e.g. simply be the reference time 
points of the first and second signal, respectively. In the above example, the value 8:45:17.23 
would then be stored with the fingerprint generated at T_n = 560 and the value 8:45:17.18 
would be stored with the fingerprint generated at T_m = 1800. 
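With the representations stored as reference time points, the relative presentation offset of the two synchronisation time points follows from a subtraction. The sketch below assumes the reference times are given as 'H:MM:SS.ss' strings, as in the example above; `ref_seconds` and `presentation_offset` are hypothetical helper names. Decimal arithmetic is used so that hundredths of a second are subtracted exactly.

```python
from decimal import Decimal


def ref_seconds(stamp: str) -> Decimal:
    """Convert an 'H:MM:SS.ss' reference-time stamp (assumed format) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + Decimal(seconds)


# First and second representations from the Figure 5 example:
# signal-local synchronisation time point -> reference time at which it should be presented.
first_repr = {560: "8:45:17.23"}    # first signal 101, T_n = 560
second_repr = {1800: "8:45:17.18"}  # second signal 103, T_m = 1800


def presentation_offset(t_n: int, t_m: int) -> Decimal:
    """Seconds of reference time by which point T_m of the second signal must be
    presented before point T_n of the first signal (negative means after)."""
    return ref_seconds(first_repr[t_n]) - ref_seconds(second_repr[t_m])


print(presentation_offset(560, 1800))  # 0.05
```

In this example the second signal's time point is due 0.05 s of reference time before the first signal's time point; how that offset is applied (delaying one signal or advancing the other) is left to the synchronisation device.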

During synchronisation, a synchronisation device according to this 
embodiment generates a first and a second fingerprint stream, or fingerprint segments, as 
explained in connection with Figure 1b. Each fingerprint stream (or segment) is compared 
with fingerprints, e.g. stored in a local or remote database, in order to determine whether 
there is a match, also as explained in connection with Figure 1b. When a matching first and 
second fingerprint have been found, the first synchronisation time point T_n (i.e. 560 in the 
above example) and the second synchronisation time point T_m (i.e. 1800 in the above 
example) are also known or derivable. Using the above-mentioned first and second 
representations of the relationship to the reference time frame, it is then possible to 
determine how the signals should be synchronised according to a given time frame. 
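How a segment of a fingerprint stream might be compared with stored fingerprints can be sketched as follows. This is a minimal illustration, not the procedure of Figure 1b: it assumes that each fingerprint is a block of 32-bit sub-fingerprints and that a match is declared when the bit error rate (BER) between a stream segment and a stored block falls below an assumed threshold of 0.35. The names `bit_error_rate` and `find_match` are hypothetical. The returned offset locates the matched segment in the stream; the synchronisation time point then follows from the segment/time-point relation used when the fingerprint was generated (e.g. the segment ending at the time point).

```python
from typing import Dict, Optional, Sequence, Tuple

SUB_FP_BITS = 32  # assumed width of one sub-fingerprint


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of differing bits between two equally long blocks of sub-fingerprints."""
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (len(a) * SUB_FP_BITS)


def find_match(
    stream: Sequence[int],
    stored: Dict[str, Sequence[int]],
    threshold: float = 0.35,  # assumed BER threshold
) -> Optional[Tuple[str, int]]:
    """Slide over the fingerprint stream and return (fingerprint ID, offset) for the first
    stored fingerprint whose BER against the stream segment is below `threshold`."""
    for offset in range(len(stream)):
        for fp_id, block in stored.items():
            segment = stream[offset:offset + len(block)]
            if len(segment) == len(block) and bit_error_rate(segment, block) < threshold:
                return fp_id, offset
    return None


# A stored audio fingerprint embedded, with 3 flipped bits, at offset 5 of the stream.
stored = {"A_FP1": [0xFFFFFFFF, 0x0F0F0F0F]}
stream = [0] * 5 + [0xFFFFFFFF ^ 0b111, 0x0F0F0F0F] + [0] * 3
print(find_match(stream, stored))  # ('A_FP1', 5)
```

The tolerance to flipped bits is what allows matching when the signals have been distorted to some extent; the same routine would be run on the audio and on the video fingerprint stream.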

As mentioned, the first and second representations may be stored in one or 
more databases and should be communicated to a synchronisation device before 
synchronisation. In one embodiment, the first and second representations are communicated 
directly from a fingerprint generation device to the synchronisation device for storage. 
Alternatively, the first and second representations are communicated to another device, e.g. 
a server, capable of communicating with a synchronisation device. This embodiment will be 
explained in greater detail in connection with Figure 6. 

Figure 6 illustrates an embodiment where first and second representations are 
stored at a remote location. Shown are an audio server 601 and a video server 602 providing 
an audio stream and a video stream to an audio fingerprint generator 202 and a video 
fingerprint generator 202, respectively. The audio and video fingerprint generators 202 
function as described in connection with Figure 2 and may be located in the same fingerprint 
generation device 200 or in two different ones. In this embodiment, the generated fingerprints 
are supplied to a database 203 located at a (database) server 600 in communications 
connection with a synchronisation device 300. The server 600 also receives and stores a first 
representation for each audio fingerprint and a second representation for each video 
fingerprint, as described e.g. in connection with Figure 5, i.e. the representations of the 
relationship between time points of the audio and video streams and a common reference 
time line or time frame. 

The synchronisation device 300 functions as described e.g. in connection with 
Figures 3 and 5. It receives the audio and video streams to be synchronised from the audio and 
video servers 601, 602, generates a fingerprint stream or fingerprint segments for each, and 
compares these against predetermined fingerprints (corresponding to 102 and 104 of Figures 
1a and 1b) signifying synchronisation time points, as described earlier. The predetermined 
fingerprints may be received from the fingerprint generation device 200 (as indicated by two 
broken arrows) or from the server 600. If the predetermined fingerprints are received from 
the server 600, storage is saved in the synchronisation device 300, which may have a more 
limited storage capacity. The first and second representations for each fingerprint pair are 
preferably also received from the server 600 and are used to synchronise the audio and 
video streams before playback, as described in connection with Figure 5. 

The server(s) may have stored predetermined fingerprints and/or their 
associated first and second representations for several different audio and video streams. 

So, in one embodiment, the predetermined fingerprints are stored at the 
synchronisation device 300 while the first and second representations are stored at one or 
more servers 600. When a fingerprint pair has been detected, the first and second 
representations of that pair are transmitted from the server(s) and used in the synchronisation 
device 300. Alternatively, the first and second representations of all predetermined 
fingerprints of a given audio and video stream may be supplied to the synchronisation 
device 300 before synchronisation begins, e.g. based on stream ID(s), etc. 

In an alternative embodiment, the predetermined fingerprints along with their 
associated first and second representations are stored only at one or more servers 600. Prior 
to the synchronisation of the streams, both the fingerprints and their associated first and 
second representations are transmitted to the synchronisation device 300, e.g. based on a 
stream ID or the like. Alternatively, only the fingerprints are transmitted before 
synchronisation begins, and upon detection of matching fingerprints the associated first and 
second representations are transmitted to the synchronisation device 300. 
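The two distribution variants, transferring all representations before synchronisation versus fetching them only when a fingerprint pair has been detected, can be contrasted in a small sketch. It assumes a hypothetical in-memory stand-in `SERVER_DB` for the server 600, keyed by stream ID, and represents each first/second representation simply as a reference time in seconds; `RepresentationClient` and its request counter are illustrative names only, not part of any real protocol.

```python
from typing import Dict

# Hypothetical stand-in for server 600: stream ID -> {fingerprint ID: reference time in seconds}.
SERVER_DB: Dict[str, Dict[str, float]] = {
    "stream-42": {"A_FP1": 31517.23, "V_FP1": 31517.18},
}


class RepresentationClient:
    """Sketch of how synchronisation device 300 might obtain representations from server 600."""

    def __init__(self, stream_id: str, prefetch: bool):
        self.stream_id = stream_id
        self.requests = 0  # number of simulated round trips to the server
        self.cache: Dict[str, float] = {}
        if prefetch:  # transfer all representations before synchronisation begins
            self.cache = dict(self._fetch_all())

    def _fetch_all(self) -> Dict[str, float]:
        self.requests += 1
        return SERVER_DB[self.stream_id]

    def representation(self, fp_id: str) -> float:
        """Return the representation for a matched fingerprint, fetching it on demand if needed."""
        if fp_id not in self.cache:
            self.requests += 1
            self.cache[fp_id] = SERVER_DB[self.stream_id][fp_id]
        return self.cache[fp_id]


eager = RepresentationClient("stream-42", prefetch=True)
lazy = RepresentationClient("stream-42", prefetch=False)
for client in (eager, lazy):
    client.representation("A_FP1")
    client.representation("V_FP1")
print(eager.requests, lazy.requests)  # 1 2
```

Prefetching costs one larger transfer up front; on-demand fetching keeps less data in the synchronisation device 300 at the cost of a round trip after each detection.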
Please note that there will usually be a period of time between the generation of 
fingerprints on the basis of the audio and video streams and the time when these streams are 
supplied to the synchronisation device 300. 

The database 203 may be a single database or several databases, which may be 
located on a single server or on several servers. 

Figure 7 schematically illustrates in more detail how, in one embodiment, the 
synchronisation may be performed in a synchronisation device using buffers. Illustrated are a 
buffer 701 for buffering audio data and a buffer 702 for buffering video data. For the audio 
buffer 701, an in-pointer I-P indicates where the next audio sample arriving from the audio 
stream is to be placed in the buffer. An out-pointer O-P indicates where the next audio 
sample is to be read. The out-pointer moves to the next slot at a pace set by a clock of the 
synchronisation device. 

For the video buffer 702, an in-pointer I-P and an out-pointer O-P are shown 
that function in the same way as explained for the audio buffer 701. 

Depending on a first representation (e.g. already present in the synchronisation 
device or received from a server as explained earlier), the out-pointer of the audio buffer 701 
is adjusted, i.e. shifted to an earlier or a later slot in the buffer 701. 

Likewise, the out-pointer of the video buffer 702 is adjusted depending on a second 
representation. 

In this way, the out-pointers are adjusted on the basis of the first and second 
representations, thereby synchronising the output streams in a very simple way. 
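The pointer mechanism of Figure 7 can be illustrated with a circular buffer. The sketch below is a minimal illustration under stated assumptions: the class name `SyncBuffer`, the helper `offset_to_slots` and the 48 kHz sample rate are hypothetical, and overflow/underflow handling is omitted. A time offset derived from the first or second representation is converted into a number of slots, and the out-pointer is shifted by that many slots.

```python
class SyncBuffer:
    """Circular buffer with an in-pointer (I-P) and an out-pointer (O-P), after Figure 7.
    Overflow/underflow checks are deliberately omitted in this sketch."""

    def __init__(self, size: int):
        self.slots = [None] * size
        self.in_p = 0   # I-P: slot for the next sample arriving from the stream
        self.out_p = 0  # O-P: slot of the next sample read on a device clock tick

    def write(self, sample) -> None:
        self.slots[self.in_p] = sample
        self.in_p = (self.in_p + 1) % len(self.slots)

    def read(self):
        sample = self.slots[self.out_p]
        self.out_p = (self.out_p + 1) % len(self.slots)
        return sample

    def shift_out_pointer(self, slots: int) -> None:
        """Move O-P by `slots`: negative values select an earlier slot, positive a later one."""
        self.out_p = (self.out_p + slots) % len(self.slots)


def offset_to_slots(offset_seconds: float, rate_hz: float) -> int:
    """Convert a time offset into a whole number of buffer slots (samples or frames)."""
    return round(offset_seconds * rate_hz)


audio = SyncBuffer(size=16)
for sample in range(10):
    audio.write(sample)
print(audio.read(), audio.read())     # 0 1
audio.shift_out_pointer(3)            # skip ahead to a later slot
print(audio.read())                   # 5
print(offset_to_slots(0.05, 48_000))  # 2400
```

Because only the read position changes, no samples are copied or modified, which is what makes this way of synchronising the output streams simple.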

In the claims, any reference signs placed between parentheses shall not be 
construed as limiting the claim. The word "comprising" does not exclude the presence of 
elements or steps other than those listed in a claim. The word "a" or "an" preceding an 
element does not exclude the presence of a plurality of such elements. 

The invention can be implemented by means of hardware comprising several 
distinct elements, and by means of a suitably programmed computer. In the device claim 
enumerating several means, several of these means can be embodied by one and the same 
item of hardware. The mere fact that certain measures are recited in mutually different 
dependent claims does not indicate that a combination of these measures cannot be used to 
advantage. 
CLAIMS: 



1. A method of enabling synchronisation of a first and a second signal, the 
method comprising the steps of 

- deriving a first fingerprint (102) on the basis of a segment of the first signal (101), 
where the segment of the first signal (101) is unambiguously related with a first 
synchronisation time point (T_n; T_n+1), 

- deriving a second fingerprint (104) on the basis of a segment of the second signal 
(103), where the segment of the second signal (103) is unambiguously related with a 
second synchronisation time point (T_n; T_n+1; T_m), and 

- supplying the first and second fingerprints (102, 104) to a synchronisation device 
(200, 300). 

2. A method according to claim 1, characterized in that the method further 
comprises, for each given synchronisation time point (T_n; T_n+1; T_m), storing the derived 
first fingerprint (102) in a database (203) and/or storing the derived second fingerprint (104) 
in the same or another database (203). 

3. A method according to claims 1-2, characterized in that the first fingerprint 
(102) and the second fingerprint (104) are transmitted to the synchronisation device (300) via 
the Internet or via other means. 

4. A method according to claims 1-3, characterized in that the segment of the 
first signal (101) and/or the segment of the second signal (103) are unambiguously related 
with the first and/or second synchronisation time point (T_n; T_n+1; T_m) according to: 

- the segment of the first signal (101) and/or the segment of the second signal (103) 
ending substantially at the first and/or second synchronisation time point (T_n; 
T_n+1; T_m), 

- the segment of the first signal (101) and/or the segment of the second signal (103) 
starting substantially at the first and/or second synchronisation time point (T_n; 
T_n+1; T_m), 

- the segment of the first signal (101) and/or the segment of the second signal (103) 
starting or ending at a predetermined distance before or after the first and/or second 
synchronisation time point (T_n; T_n+1; T_m), or 

- the first and/or second synchronisation time point (T_n; T_n+1; T_m) being at a 
predetermined time point between a start and an end of the segment of the first signal 
(101) and/or the segment of the second signal (103). 



5. A method according to claims 1-4, characterized in that the first (T_n; T_n+1) 
and second synchronisation time points (T_n; T_n+1; T_m) are the same. 

6. A method according to claims 1-4, characterized in that the first 
synchronisation time point (T_n; T_n+1) is different from the second synchronisation time 
point (T_n; T_n+1; T_m) and in that the method comprises the step of storing a first 
representation of a relationship between the first synchronisation time point (T_n; T_n+1) and 
a first time point of a reference time (107) and storing a second representation of a 
relationship between the second synchronisation time point (T_n; T_n+1; T_m) and a second 
time point of said reference time (107). 

7. A method according to claims 1-6, characterized in that the method further 
comprises the steps of: 

- transmitting the first and/or second representation to a synchronisation device (300), 
and/or 

- transmitting the first and/or second representation to a server (600) in 
communications connection with a synchronisation device (300), and/or 

- transmitting the one or more derived first fingerprints (102) and second fingerprints 
(104) to the server (600). 

8. A method of synchronising two or more signals, the method comprising the 
steps of: 

- generating a first fingerprint stream (105) on the basis of a first signal (101), 

- generating a second fingerprint stream (106) on the basis of a second signal (103), 

- comparing a segment of the first fingerprint stream (105) with one or more first 
fingerprints (102) stored in at least one database (203) in order to determine if a 
match exists or not, 

- comparing a segment of the second fingerprint stream (106) with one or more second 
fingerprints (104) stored in the at least one database (203) in order to determine if a 
match exists or not, and 

- if a match exists for both a first and a second fingerprint (102; 104), determining a 
location of a first synchronisation time point (T_n; T_n+1) for the first signal (101) and a 
location of a second synchronisation time point (T_n; T_n+1; T_m) for the second signal 
(103) and synchronising the first (101) and the second (103) signal using the 
determined locations. 

9. A method according to claim 8, characterized in that the step of synchronising 
comprises: delaying either the first (101) or the second (103) signal by an amount equal to a 
difference, if any, between the location of the first synchronisation time point (T_n; T_n+1) 
for the first signal (101) and the location of the second synchronisation time point (T_n; 
T_n+1; T_m) for the second signal (103). 

10. A method according to claims 8-9, characterized in that the location of the 
first and/or the second synchronisation time point (T_n; T_n+1; T_m) for the first and/or the 
second signal (101, 103) is given by an unambiguous relation with a segment of a first signal 
(101) and/or a segment of a second signal (103) used during generation of the matching first 
fingerprint (102) and of the matching second fingerprint (104). 

11. A method according to claims 8-10, characterized in that the first and second 
synchronisation time points (T_n; T_n+1; T_m) are the same. 

12. A method according to claims 8-10, characterized in that the first and second 
synchronisation time points (T_n; T_n+1; T_m) are different and in that the method further 
comprises: 

- if a match exists for both a first and a second fingerprint (102; 104) 

- obtaining a first representation of a relationship between the first synchronisation time 
point (T_n; T_n+1) and a first time point of a reference time (107), 

- obtaining a second representation of a relationship between the second 
synchronisation time point (T_n; T_n+1; T_m) and a second time point of said reference 
time (107), and 

- using the first and second time points of said reference time (107) to synchronise the 
first (101) and the second signal (103), 

- instead of 

- determining, if a match exists for both a first and a second fingerprint (102; 104), a 
location of a first synchronisation time point (T_n; T_n+1) for the first signal (101) and a 
location of a second synchronisation time point (T_n; T_n+1; T_m) for the second signal 
(103) and synchronising the first (101) and the second (103) signal using the 
determined locations. 

13. A method according to claim 12, characterized in that the method further 
comprises the steps of: 

- receiving the first and/or second representation in a synchronisation device (300) 
from a server (600) in communications connection with the synchronisation device 
(300), and/or 

- receiving the one or more first fingerprints (102) and second fingerprints (104) from 
the server (600). 

14. A method according to claims 1-8 or claims 9-13, characterized in that said 
first signal (101) is an audio signal, said second signal (103) is a video signal, said first 
fingerprint (102) is an audio fingerprint, and said second fingerprint (104) is a video 
fingerprint. 

15. A device (200) for synchronising at least two signals, the device comprising 
a fingerprint generator (202) adapted to: 

- derive a first fingerprint (102) on the basis of a segment of a first signal (101), 
where the segment of the first signal (101) is unambiguously related with a first 
synchronisation time point (T_n; T_n+1), and 

- derive a second fingerprint (104) on the basis of a segment of a second signal 
(103), where the segment of the second signal (103) is unambiguously related with a 
second synchronisation time point (T_n; T_n+1; T_m). 

16. A device according to claim 15, characterized in that the device further 
comprises at least one database (203) having stored the derived first fingerprint (102) and/or 
the derived second fingerprint (104) for each given synchronisation time point (T_n; T_n+1; T_m). 
17. A device according to claims 15-16, characterized in that the device further 
comprises a transmitter (204) for transmitting the one or more derived first fingerprints (102) 
and second fingerprints (104) in the at least one database (203) to a synchronisation device 
(300) via the Internet or via other means. 

18. A device according to claims 15-17, characterized in that the segment of the 
first signal (101) and/or the segment of the second signal (103) are unambiguously related 
with the first and/or second synchronisation time point (T_n; T_n+1; T_m) according to: 

- the segment of the first signal (101) and/or the segment of the second signal (103) 
ending substantially at the first and/or second synchronisation time point (T_n; 
T_n+1; T_m), 

- the segment of the first signal (101) and/or the segment of the second signal (103) 
starting substantially at the first and/or second synchronisation time point (T_n; 
T_n+1; T_m), 

- the segment of the first signal (101) and/or the segment of the second signal (103) 
starting or ending at a predetermined distance before or after the first and/or second 
synchronisation time point (T_n; T_n+1; T_m), or 

- the first and/or second synchronisation time point (T_n; T_n+1; T_m) being at a 
predetermined time point between a start and an end of the segment of the first signal 
(101) and/or the segment of the second signal (103). 



19. A device according to claims 15-18, characterized in that the first 
synchronisation time point (T_n; T_n+1) and the second synchronisation time point (T_n; 
T_n+1; T_m) are the same. 

20. A device according to claims 15-18, characterized in that the first 
synchronisation time point (T_n; T_n+1) is different from the second synchronisation time 
point (T_n; T_n+1; T_m) and in that the device comprises means adapted to store a first 
representation of a relationship between the first synchronisation time point (T_n; T_n+1) and 
a first time point of a reference time (107) and to store a second representation of a 
relationship between the second synchronisation time point (T_n; T_n+1; T_m) and a second 
time point of said reference time (107). 
21. A device according to claim 20, characterized in that the device further 
comprises: 

- a transmitter (204) for transmitting the first and/or second representation to a 
synchronisation device (300), and/or 

- a transmitter (204) for transmitting the first and/or second representation to a server 
(600) in communications connection with a synchronisation device (300), and/or 

- a transmitter (204) for transmitting the one or more derived first fingerprints (102) 
and second fingerprints (104) to the server (600). 

22. A synchronisation device (300) for synchronising two or more signals, the 
device comprising: 

- means (302) for generating a first fingerprint stream (105) on the basis of a first signal 
(101), 

- means (302) for generating a second fingerprint stream (106) on the basis of a second 
signal (103), 

- means (302) for comparing a segment of the first fingerprint stream (105) with one or 
more first fingerprints (102) stored in at least one database (203) in order to determine 
if a match exists or not, 

- means (302) for comparing a segment of the second fingerprint stream (106) with one 
or more second fingerprints (104) stored in the at least one database (203) in order to 
determine if a match exists or not, and 

- means (302) for, if a match exists for both a first and a second fingerprint (102; 104), 
determining a location of a first synchronisation time point (T_n; T_n+1) for the first 
signal (101) and determining a location of a second synchronisation time point (T_n; 
T_n+1; T_m) for the second signal (103) and means (303) for synchronising the first (101) 
and the second (103) signal using the determined locations. 

23. A device according to claim 22, characterized in that the means (303) for 
synchronising is adapted to: delay either the first (101) or the second (103) signal by an 
amount equal to a difference, if any, between the location of the synchronisation time point 
(T_n; T_n+1) for the first signal (101) and the location of the synchronisation time point (T_n; 
T_n+1; T_m) for the second signal (103). 
24. A device according to claims 22-23, characterized in that the location of the 
first and/or second synchronisation time point (T_n; T_n+1; T_m) for the first and/or second 
signal (101, 103) is given by an unambiguous relation with a segment of a first signal (101) 
and/or a segment of a second signal (103) used during generation of the matching first 
fingerprint (102) and of the matching second fingerprint (104). 

25. A device according to claims 22-24, characterized in that the first and second 
synchronisation time points (T_n; T_n+1; T_m) are the same. 



26. A device according to claims 22-25, characterized in that the first and second 
synchronisation time points (T_n; T_n+1; T_m) are different and in that the device further 
comprises: 

- if a match exists for both a first and a second fingerprint (102; 104), 

- a receiver (204) for obtaining a first representation of a relationship between the first 
synchronisation time point (T_n; T_n+1) and a first time point of a reference time (107), 

- a receiver (204) for obtaining a second representation of a relationship between the 
second synchronisation time point (T_n; T_n+1; T_m) and a second time point of said 
reference time (107), and 

- synchronisation means (303) for using the first and second time points of said 
reference time (107) to synchronise the first (101) and the second signal (103), 

- instead of comprising 

- means (302) for, if a match exists for both a first and a second fingerprint (102; 104), 
determining a location of a first synchronisation time point (T_n; T_n+1) for the first 
signal (101) and determining a location of a second synchronisation time point (T_n; 
T_n+1; T_m) for the second signal (103) and means (303) for synchronising the first (101) 
and the second (103) signal using the determined locations. 

27. A device according to claim 26, characterized in that the device further 
comprises: 

- a receiver (204) for receiving the first and/or second representation in a 
synchronisation device (300) from a server (600) in communications connection with 
the synchronisation device (300), and/or 

- a receiver (204) for receiving the one or more first fingerprints (102) and second 
fingerprints (104) from the server (600). 
28. A device according to claims 15-21 or claims 22-27, characterized in that 
said first signal (101) is an audio signal, said second signal (103) is a video signal, said first 
fingerprint (102) is an audio fingerprint, and said second fingerprint (104) is a video 
fingerprint. 

29. A computer readable medium having stored thereon instructions for causing 
one or more processing units to execute the method according to any one of claims 1-8 or 
any one of claims 9-14. 
ABSTRACT: 



This invention relates to a device and a method of generating a first and a 
second fingerprint usable for synchronisation of at least two signals, and to a corresponding 
method and device for synchronising two or more signals. A fingerprint pair is generated on 
the basis of a segment of a first signal, e.g. an audio signal, and of a segment of a second 
signal, e.g. a video signal, at each synchronisation time point. The generated fingerprint pair(s) 
are stored in a database and communicated or distributed to a synchronisation device. During 
synchronisation, fingerprint(s) of the audio signal and fingerprint(s) of the video signal to be 
synchronised are generated and matched against fingerprints in the database. When a match 
is found, the fingerprints also determine the synchronisation time point, which is used to 
synchronise the two signals. 

In this way, a simple, reliable and efficient way of synchronising at least two 
signals is obtained. Further, this is achieved without modifying either the first or the second 
signal (or any subsequent signals). The signals may even be distorted or changed to some 
extent while still enabling synchronisation. 

Figure 1b 